S3Drive
Community / general / Object lock / enterprise
Markus Berthold 6/30/2023 11:50 AM
Seems like the web client has the same issue:
11:50 AM
No matter whether an object with the same name has existed before. (edited)
Tom
Currently it's Discord, support@s3drive.app, or https://github.com/s3drive/app/issues. Since we already have it here, it's entirely fine, except it would normally go in our (https://discord.com/channels/1069654792902815845/1102236355645419550) channel. (edited)
11:51 AM
The web client is run from the same codebase, so behavior will usually be the same.
11:52 AM
What would be the expectation? I guess a proper error message. In other words, instead of a 400 HTTP code we should display the issue related to Object Lock / Compliance mode, am I right?
11:53 AM
You've also mentioned that no subsequent upload works. Do you mean that no new file can be uploaded since this error occurred, even if the new upload shouldn't fail because of the compliance settings?
In reply to Tom: "What would be the expectation? I guess a proper error message. In other words, instead of a 400 HTTP code we should display the issue related to Object Lock / Compliance mode, am I right?"
Markus Berthold 6/30/2023 12:11 PM
I have enabled the governance mode for the bucket. You can test on your own, I will send you the details in a direct message.
Tom, in reply to Markus Berthold: "I have enabled the governance mode for the bucket. You can test on your own, I will send you the details in a direct message."
That's great. If you could send me the details, that would help me understand the issue. (edited)
Thanks for the details, I've connected and got the issue:
<?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidRequest</Code><Message>Content-MD5 HTTP header is required for Put Object requests with Object Lock parameters</Message><RequestId>...</RequestId><HostId>...</HostId></Error>
I remember we solved this issue in S3Drive's predecessor: https://play.google.com/store/apps/details?id=com.photosync.s3 but the fix hasn't made it into S3Drive just yet. Fix: https://github.com/s3drive/app/issues/16#issuecomment-1257024140
In other words, we need to add this header if compliance mode is enabled. Since we don't want to do it by default, we'll likely add a configurable setting, which will get switched on automatically if we detect this error message. (edited)
12:24 PM
Unless you have a better idea how we could address that, that's the path we would likely take.
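(For illustration only, a minimal sketch in Python with boto3 of the fallback described above: try a plain PutObject and, if the bucket rejects it with the Object Lock MD5 error, retry with a Content-MD5 header. The helper names are hypothetical and this is not S3Drive's actual code.)
```python
import base64
import hashlib

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def content_md5(data: bytes) -> str:
    """Base64-encoded MD5 digest, as expected by the Content-MD5 header."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")

def put_object_with_md5_fallback(bucket: str, key: str, data: bytes) -> None:
    """Try a plain PutObject first; on the Object Lock error, retry with Content-MD5."""
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=data)
    except ClientError as err:
        code = err.response.get("Error", {}).get("Code", "")
        message = err.response.get("Error", {}).get("Message", "")
        if code == "InvalidRequest" and "Content-MD5" in message:
            # Bucket has Object Lock / default retention: resend with the MD5 header.
            s3.put_object(Bucket=bucket, Key=key, Body=data,
                          ContentMD5=content_md5(data))
        else:
            raise
```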
There is a challenge that we need to think about how to address, which is related to E2E encryption. In order to calculate the object's MD5, we need to know the object contents... but with E2E encryption enabled, the MD5 has to be calculated from the encrypted content, which isn't available (because it's streamed) at the moment we send the initial HTTP request. (edited)
Markus Berthold 6/30/2023 12:59 PM
Why do you need the unencrypted MD5 hash?
1:02 PM
Or store additional properties in object metadata?
Tom
https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock-overview.html
"If you configure a default retention period on a bucket, requests to upload objects in such a bucket must include the Content-MD5 header. For more information, see Put Object in the Amazon Simple Storage Service API Reference."
https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html
"The Content-MD5 header is required for any request to upload an object with a retention period configured using Amazon S3 Object Lock. For more information about Amazon S3 Object Lock, see Amazon S3 Object Lock Overview in the Amazon S3 User Guide."
Basically, we would need to provide the object's MD5 ahead of sending any data. With unencrypted data that's fine: we can calculate the MD5 locally and then send the HTTP request. With client-side encrypted data this is more complicated, since we can't predict what the encrypted content will look like. We could, for instance, encrypt the object to the local FS as a temporary file, calculate the MD5, stream the encrypted version, and then delete the temporary file. We could also do the same in-memory (which wouldn't work for big files though). This has certain consequences and complexity, however. (edited)
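(An illustrative sketch of the temporary-file workaround just described: encrypt to a local temp file, compute the MD5 of the ciphertext, then upload with Content-MD5. Single-shot AES-GCM via the Python cryptography library is used only for brevity; it is not S3Drive's real scheme, and a production version would stream chunk by chunk.)
```python
import base64
import hashlib
import os
import tempfile

import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

s3 = boto3.client("s3")

def upload_encrypted_with_md5(bucket: str, key: str, plaintext_path: str, cek: bytes) -> None:
    """Encrypt to a temporary file first, so the ciphertext MD5 is known
    before the PutObject request is sent."""
    aead = AESGCM(cek)                      # cek: 256-bit content encryption key
    nonce = os.urandom(12)
    with open(plaintext_path, "rb") as f:
        ciphertext = nonce + aead.encrypt(nonce, f.read(), None)  # single-shot for brevity

    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(ciphertext)
        tmp_path = tmp.name
    try:
        # Hash the ciphertext from disk, then stream the same file to S3.
        md5 = hashlib.md5()
        with open(tmp_path, "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                md5.update(chunk)
        md5_b64 = base64.b64encode(md5.digest()).decode("ascii")
        with open(tmp_path, "rb") as body:
            s3.put_object(Bucket=bucket, Key=key, Body=body, ContentMD5=md5_b64)
    finally:
        os.unlink(tmp_path)                 # delete the temporary ciphertext
```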
1:13 PM
I am not sure if it's feasible, but we could actually follow the above-mentioned workaround (streaming the file before sending the HTTP request) for files < 5MB, and for files >= 5MB we could use the https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateMultipartUpload.html API, which doesn't seem to require Content-MD5. This API isn't available for files < 5MB, which is why for small files we would have to do it differently. We need to test whether this workaround works, because if it does, this feature would fit nicely into work we're doing at the moment: https://s3drive.canny.io/feature-requests/p/multipart-uploads-for-bigger-files (edited)
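(A rough Python/boto3 sketch of the multipart flow mentioned above, assuming for the moment that no per-part Content-MD5 is needed; whether that holds on Object Lock buckets is examined further below. Not S3Drive's actual code.)
```python
import boto3

s3 = boto3.client("s3")
PART_SIZE = 5 * 1024 * 1024  # 5 MiB minimum part size

def multipart_upload(bucket: str, key: str, path: str) -> None:
    """Create the upload, push the (already encrypted) file part by part,
    then complete it; any error aborts the upload so no orphaned parts remain."""
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        with open(path, "rb") as f:
            part_number = 1
            while chunk := f.read(PART_SIZE):
                resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                                      PartNumber=part_number, Body=chunk)
                parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
                part_number += 1
        s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                     MultipartUpload={"Parts": parts})
    except Exception:
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```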
Markus Berthold 6/30/2023 1:41 PM
Now I know what you mean. But still, all calculations must be done locally. I think MultipartUpload is the preferred way for bigger files. For files < 5MB the encryption + MD5 hash could be calculated in memory IMHO.
1:45 PM
I have a question about your E2E encryption: at the moment it looks like you are using a 128-bit master key. What are your plans if you need to improve the E2E? In the future you may need to support different encryption types with different key lengths. This information should/could be part of the "key" which is presented or could be entered.
Markus Berthold 6/30/2023 1:53 PM
Also for object lock, what are your plans for specifying an individual retention time?
Tom, in reply to Markus Berthold: "Now I know what you mean. But still, all calculations must be done locally. I think MultipartUpload is the preferred way for bigger files. For files < 5MB the encryption + MD5 hash could be calculated in memory IMHO."
The UploadPart API to upload individual chunks still enforces Content-MD5 though: https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPart.html so the chunk size would still have to be manageable for in-memory encryption. If we used the minimum size, which is 5MB, that's fine (memory-wise), except it would limit the upper file size to 10k (max chunks) * 5MB = 50GB, which in some cases is too small: https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html We'll likely introduce a setting with the preferred size, which the user could either control or which would be reasonably pre-filled based on e.g. desktop/mobile or RAM amount.
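(A small sketch of the size arithmetic and the per-part MD5 described above; helper names are hypothetical. With 5 MiB parts the ceiling is 10,000 * 5 MiB = ~50 GB, so e.g. a 200 GB file would need parts of roughly 20 MiB.)
```python
import base64
import hashlib

MIN_PART_SIZE = 5 * 1024 * 1024   # 5 MiB - S3 minimum part size (except the last part)
MAX_PARTS = 10_000                # S3 limit on parts per multipart upload

def preferred_part_size(file_size: int, configured: int = MIN_PART_SIZE) -> int:
    """Pick a part size that keeps the upload under the 10,000-part limit;
    bigger files need proportionally bigger (more memory-hungry) parts."""
    needed = -(-file_size // MAX_PARTS)            # ceil(file_size / MAX_PARTS)
    return max(configured, MIN_PART_SIZE, needed)

def part_content_md5(part: bytes) -> str:
    """Per-part Content-MD5 value, which UploadPart still requires on Object Lock buckets."""
    return base64.b64encode(hashlib.md5(part).digest()).decode("ascii")
```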
Markus Berthold 6/30/2023 2:54 PM
Sounds good.
2:56 PM
The multipart upload would also have the advantage that it might be faster. In my tests I was only able to utilize 8 Mbit/s per upload (with E2E). When multiple chunks are used (and you have multiple CPU cores) that might also be an improvement.
Tom, in reply to Markus Berthold: "I have a question about your E2E encryption: at the moment it looks like you are using a 128-bit master key. What are your plans if you need to improve the E2E? In the future you may need to support different encryption types with different key lengths. This information should/could be part of the "key" which is presented or could be entered."
The current E2E protocol based on AES-GCM is our 2nd revision. Prior to that we supported AES-CBC. There was a transitionary period where we supported both protocols. We could distinguish which cipher to apply, because all of the required encryption scheme information is part of the object's metadata (https://s3drive.app/images/aes_encryption.png). Next month or so we will be releasing improvements to our encryption scheme which will use the STREAM protocol combined with AES-GCM: https://s3drive.canny.io/feature-requests/p/implement-chunked-encryption-using-stream-protocol This change will be reflected in the object's metadata. New objects will use the new encryption scheme, whereas existing objects will remain readable as they are currently. The fact that we use metadata to keep information about the encryption makes us pretty flexible if we ever need to improve the E2E encryption.
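(A sketch of the metadata-driven cipher selection described above, using hypothetical metadata field names; the real S3Drive keys may differ.)
```python
import boto3

s3 = boto3.client("s3")

def put_encrypted_object(bucket: str, key: str, ciphertext: bytes, scheme: str, iv_b64: str) -> None:
    """Store the encryption scheme details alongside the object, so older
    objects (e.g. AES-CBC) stay readable after the scheme changes."""
    s3.put_object(
        Bucket=bucket, Key=key, Body=ciphertext,
        Metadata={
            "enc-scheme": scheme,          # e.g. "aes-cbc", "aes-gcm", "aes-gcm-stream"
            "enc-iv": iv_b64,              # per-object IV/nonce, base64-encoded
        },
    )

def detect_scheme(bucket: str, key: str) -> str:
    """Pick the decryption routine based on the stored metadata."""
    head = s3.head_object(Bucket=bucket, Key=key)
    return head.get("Metadata", {}).get("enc-scheme", "plaintext")
```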
Tom, in reply to the same question:
The Master Key is currently 128-bit, however that's the KEK that encrypts the CEK, which is 256-bit. If we ever need to increase the KEK size, that's fine.
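(A minimal envelope-encryption sketch of the KEK/CEK split mentioned above: a random 256-bit CEK wrapped under the 128-bit KEK. AES-GCM key wrapping is shown purely for illustration; it isn't necessarily S3Drive's exact construction.)
```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def wrap_cek(kek_128: bytes) -> tuple[bytes, bytes, bytes]:
    """Generate a random 256-bit CEK and encrypt (wrap) it under the 128-bit KEK.

    Returns (cek, nonce, wrapped_cek); only nonce and wrapped_cek would be
    stored next to the object, never the plaintext CEK.
    """
    assert len(kek_128) == 16              # 128-bit key-encryption key
    cek = os.urandom(32)                   # 256-bit content-encryption key
    nonce = os.urandom(12)
    wrapped = AESGCM(kek_128).encrypt(nonce, cek, None)
    return cek, nonce, wrapped

def unwrap_cek(kek_128: bytes, nonce: bytes, wrapped: bytes) -> bytes:
    """Recover the CEK; moving to a longer KEK later only changes the key length used here."""
    return AESGCM(kek_128).decrypt(nonce, wrapped, None)
```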
Markus Berthold 6/30/2023 3:02 PM
I have not had a look at the object properties but your approach is good.
Tom, in reply to Markus Berthold: "I have not had a look at the object properties but your approach is good."
That's the approach that AWS used in their encryption scheme, so we've followed the same. It isn't the most universal approach, as there are some storages which do not support metadata, but since we're pretty fixed on S3, that's not an issue for us. (edited)
Markus Berthold 6/30/2023 3:05 PM
At least from my point of view that's enough. And if anything is (or was) part of the protocol it's not so easy to remove it from the protocol, so I would say it should be safe.
3:06 PM
All storages which claim to use S3 should be compatible with it.
Tom, in reply to Markus Berthold: "The multipart upload would also have the advantage that it might be faster. In my tests I was only able to utilize 8 Mbit/s per upload (with E2E). When multiple chunks are used (and you have multiple CPU cores) that might also be an improvement."
Exactly as you say, there is room for improvement in that area. With or without multipart upload, we've been designing things so that parallelization is possible. For instance, with AES-CBC you had to encrypt "block by block"; with AES-GCM you can parallelize encryption, but you would have to find a pretty smart implementation or build your own to do that. With the recent STREAM (https://github.com/miscreant/meta/wiki/STREAM), we have an internal "chunk counter" and more control over the encryption process (including breaking the encrypted blob down into manageable chunks) without losing any of its properties (at least predictable size, random access, authentication, parallel encryption/decryption, reordering protection, truncation protection). (edited)
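(A simplified sketch of the STREAM idea referenced above: per-chunk AES-GCM with a nonce built from a random prefix, a chunk counter and a last-chunk flag, which is what enables random access, parallel encryption/decryption and truncation/reordering protection. The real construction documented in the miscreant wiki has further details; this is illustrative only.)
```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

CHUNK_SIZE = 64 * 1024  # illustrative chunk size

def encrypt_stream(key: bytes, plaintext: bytes) -> tuple[bytes, list[bytes]]:
    """STREAM-style chunked AES-GCM: chunks are encrypted independently,
    so they can be processed in parallel and read back by index."""
    aead = AESGCM(key)
    prefix = os.urandom(7)                                  # per-object nonce prefix
    chunks = [plaintext[i:i + CHUNK_SIZE]
              for i in range(0, len(plaintext), CHUNK_SIZE)] or [b""]
    out = []
    for counter, chunk in enumerate(chunks):
        last = counter == len(chunks) - 1
        # 7-byte prefix + 4-byte counter + 1-byte "last chunk" flag = 12-byte nonce
        nonce = prefix + counter.to_bytes(4, "big") + bytes([1 if last else 0])
        out.append(aead.encrypt(nonce, chunk, None))
    return prefix, out

def decrypt_chunk(key: bytes, prefix: bytes, counter: int, last: bool, ciphertext: bytes) -> bytes:
    """Random access: any chunk can be decrypted on its own from its index."""
    nonce = prefix + counter.to_bytes(4, "big") + bytes([1 if last else 0])
    return AESGCM(key).decrypt(nonce, ciphertext, None)
```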
Tom, continuing in reply to the same message:
The sky is the only limit, along with our technical ability to improve the client's parallel operations (also applicable to other operations, e.g. copy / rename), but this will eventually come. (edited)
Markus Berthold 6/30/2023 3:12 PM
I understand that at the moment object lock is not possible because of the missing md5 processing. I know it is hard to say but what is your estimation about the above? For the project I mentioned object lock and E2E are two things which are essential.
Tom, in reply to Markus Berthold: "I understand that at the moment object lock is not possible because of the missing md5 processing. I know it is hard to say but what is your estimation about the above? For the project I mentioned object lock and E2E are two things which are essential."
Combined with E2E it's slightly more complex than I initially anticipated. It would make sense to implement it together with the multipart upload. A non-binding estimate would be that by mid-July we shall have a prototype, which will certainly require tweaks and testing. If things go well (that is, no more surprises in the protocol requirements), by the end of July this shall land in production. Does that sound reasonable? (edited)
👍🏻 1
3:21 PM
We could work around it just for the sake of completeness and do it without multipart upload (the local-file solution that I mentioned). This would speed things up, but it's double work for us (because ultimately it's a subpar solution which will require replacement anyway). With multipart upload things take slightly longer, because we also need to combine it with the new encryption scheme release (in other words, we don't want to fight with the current non-streamable AES-GCM encryption combined with multipart upload). (edited)
Tom, in reply to Markus Berthold's message about the 8 Mbit/s upload speed:
8 Mbit/s is rather slow. I guess you might've tried a file which is bigger than 100MB? Unfortunately this falls back to software encryption, which is xx times slower. This will change once we release the STREAM protocol next month, which will make hardware encryption apply to all file sizes. (edited)
In reply to Tom: "8 Mbit/s is rather slow. I guess you might've tried a file which is bigger than 100MB? Unfortunately this falls back to software encryption, which is xx times slower. This will change once we release the STREAM protocol next month, which will make hardware encryption apply to all file sizes. (edited)"
Markus Berthold 6/30/2023 7:09 PM
I made a test with 1 GB and this was the result. The smaller file sizes (10 MB) were really quick.
Tom, in reply to Markus Berthold: "I made a test with 1 GB and this was the result. The smaller file sizes (10 MB) were really quick."
Great to hear. Anything below 100MB would usually be limited by network speed, as hardware AES-GCM is pretty fast on a modern CPU. There are also 5 upload isolates, so if a user uploads multiple files the load shall distribute evenly between 5 threads. (edited)
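(Illustrative only: S3Drive's upload isolates are a Dart concept; the same 5-way fan-out is sketched here with a Python thread pool and boto3.)
```python
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")
UPLOAD_WORKERS = 5   # mirrors the "5 upload isolates" mentioned above

def upload_one(bucket: str, key: str, path: str) -> str:
    with open(path, "rb") as f:
        s3.put_object(Bucket=bucket, Key=key, Body=f)
    return key

def upload_many(bucket: str, files: dict[str, str]) -> list[str]:
    """Spread multiple file uploads across a small worker pool so the
    per-connection throughput limit applies per file, not per batch."""
    with ThreadPoolExecutor(max_workers=UPLOAD_WORKERS) as pool:
        futures = [pool.submit(upload_one, bucket, key, path)
                   for key, path in files.items()]
        return [f.result() for f in futures]
```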
Hi @Markus Berthold I just wanted to let you know that we've done some research and are making good progress on the prototype. We shall have a beta version of S3Drive using the new performant encryption scheme, supporting object lock, multipart upload and drive mount (on Windows, Mac and Linux) by the end of next week. Stabilizing these features will take us to the end of July, but we could certainly provide the (beta) build for you to have a look.
You've also mentioned the possibility of setting up different passwords for different folders. We can certainly achieve this by implementing a "vault" approach in a similar way to "rclone crypt" (https://rclone.org/crypt/). It's unlikely we will be able to make progress on that this month, but depending on the severity we could prioritize it over our other roadmap items.
There are a couple of challenges that we're facing where we will have to choose between compatibility and performance/security. In summary, we would like to use Rclone's disk mount (for its Windows support and pretty good caching settings), but we would have to make our cipher compatible with Rclone's, which is subpar (no truncation protection, no key separation, not a performant cipher (XSalsa20, 64KB chunks) for Web) compared to what we've planned to release with the STREAM approach. The other option would be to modify Rclone to support our scheme, but then we lose the compatibility that we could've gained and end up with a fork that we would have to maintain.
Most realistically, in the short term we could align with Rclone's ciphers even though it has some implications and web performance won't be great. In the next stage we could squeeze this model to the limit; perhaps we could liaise with Rclone's maintainers to align certain things, so we don't need to diverge at all. If this doesn't work we could then fork; upgrading the cipher while maintaining compatibility with the older cipher is also easy, since Rclone's header contains the cipher versioning.
1:21 PM
If you have some comments on this, that would be great; otherwise we're pushing ahead with these changes... this will benefit everyone, including companies and end-users. Did you find any other roadblocks with S3Drive? (edited)
Markus Berthold 7/4/2023 1:35 PM
I have not had the chance for further tests. I hope I can continue by the end of the week.
1:40 PM
Regarding the project: we probably have an appointment in the next weeks with the server manufacturer which involved us in the project. So it might be that the project will be delayed, which would be good for us. You don't have to rush with the folder topic. Maybe we can work around it when we get in contact with the customer. The E2E together with object lock is of course still an important topic.
Tom
Thanks for letting me know. Speaking of E2EE, how important is the filename / filepath encryption? It would also be good to understand the requirements for file sharing. Would files be shared within the organisation or externally? There is an item which we need to execute to further improve security: https://s3drive.canny.io/feature-requests/p/secure-encrypted-sharing The challenge is that if we apply Rclone's cipher then our "Secure sharing" idea no longer works. (edited)
Hi @Markus Berthold, a quick update from our side. Things went smoother than expected, so we've also decided to build and include filepath encryption in the next major release. We're pushing hard to release all of the improvements, including object lock support, drive mount (including Windows) and multipart upload, as early as next week. If there is anything you would like to discuss, I am available for a chat. (edited)
Markus Berthold 7/13/2023 10:24 AM
Hi Tom, I am on a business trip this week and very limited in time. I am interested in doing further testing with object lock when it's available. What are your plans regarding the object lock retention configuration settings?
Tom, in reply to Markus Berthold: "Hi Tom, I am on a business trip this week and very limited in time. I am interested in doing further testing with object lock when it's available. What are your plans regarding the object lock retention configuration settings?"
We were mostly focused on the technical solution in order to make Object Lock possible at all with E2E encryption, given the requirement of MD5 hash generation. As such, our testing environment had a default retention configuration applied to the bucket globally. We haven't really thought about how to configure the object lock mode from within S3Drive, but S3 provides these headers:
x-amz-object-lock-mode: ObjectLockMode - The Object Lock mode that you want to apply to this object. Valid Values: GOVERNANCE | COMPLIANCE
x-amz-object-lock-retain-until-date: ObjectLockRetainUntilDate - The date and time when you want this object's Object Lock to expire. Must be formatted as a timestamp parameter.
x-amz-object-lock-legal-hold: ObjectLockLegalHoldStatus - Specifies whether a legal hold will be applied to this object. For more information about S3 Object Lock, see Object Lock. Valid Values: ON | OFF
So it's a matter of providing a sane settings UI where these settings can be applied. Depending on the requirements there could be multiple layers with override rules. For instance, the user could specify settings on the bucket level which would then be overridden by settings on the folder level, then on the sub-folder level (and so on), down to the file level. We're open to suggestions on how this should/could work.
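(A sketch of the layered override idea using boto3's Object Lock parameters, which map to the headers listed above. The prefix table and helper names are hypothetical, not S3Drive's actual design.)
```python
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")

# Hypothetical per-prefix settings; the most specific prefix wins,
# falling back to the bucket-level default ("" key).
LOCK_SETTINGS = {
    "": {"mode": "GOVERNANCE", "days": 30},            # bucket default
    "finance/": {"mode": "COMPLIANCE", "days": 365},   # folder override
}

def resolve_lock(key: str) -> dict:
    """Walk from the most specific matching prefix down to the bucket default."""
    best = ""
    for prefix in LOCK_SETTINGS:
        if key.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    return LOCK_SETTINGS[best]

def put_with_object_lock(bucket: str, key: str, body: bytes, md5_b64: str) -> None:
    lock = resolve_lock(key)
    retain_until = datetime.now(timezone.utc) + timedelta(days=lock["days"])
    s3.put_object(
        Bucket=bucket, Key=key, Body=body, ContentMD5=md5_b64,
        ObjectLockMode=lock["mode"],                      # x-amz-object-lock-mode
        ObjectLockRetainUntilDate=retain_until,           # x-amz-object-lock-retain-until-date
        ObjectLockLegalHoldStatus="OFF",                  # x-amz-object-lock-legal-hold
    )
```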
Markus Berthold 7/13/2023 3:26 PM
Your suggestion with default and folder override sounds very good.
Tom
Hi @Markus Berthold, We've finally released everything, feel free to try out our recent major release: https://s3drive.app/changelog We're still resolving issues with Apple regarding the iOS and macOS releases; judging by the sluggish communication I would expect it to take a couple more days. In principle you can use the current iOS release (1.4.0), but on macOS there is only an unsigned .DMG version available, which won't persist S3 credentials to the Keychain properly. I would be glad to know if you find it an improvement and whether the recent features satisfy your use case. In the meantime we're monitoring the release and already working on a couple of improvements.
Markus Berthold 7/26/2023 11:53 AM
Hi Tom, I hope I will find the time to test everything in the near future. Unfortunately I am very busy at the moment. Regarding the project I talked with you about: from a contact I have heard that a competitor offered the following solution: https://teamdrive.com/en/ So far I have had no time to check the product. Maybe it could be interesting for you to know your competitors.
Tom
Hi Markus, From what I've initially understood, TeamDrive is more like a closed-source version of Nextcloud and is a pretty well-established business / product which offers a quite rich set of collaboration tools, including an Office suite. With TeamDrive this comes at the cost of managing multiple moving parts and components: https://teamdrive.com/en/components-and-modules#enterprise and being responsible for multiple Docker, Apache and MySQL deployments, maintenance, updates and availability. Even the web client itself needs a separate "TeamDrive Webportal Server". This all needs to be handled on top of the S3 service that you either buy from a 3rd party or host yourself. For some companies that's not a big deal, for others it's a deal breaker.
Our idea behind S3Drive is entirely different. Our aim is to deliver value without putting on the user the burden of hosting anything else besides the S3. There is an abundance of hosting providers which sell S3 as a service. Our web client is self-hosted and doesn't need anything other than a modern browser. This comes with the huge challenge of building around the limitations of the S3 protocol itself, but that's on us. Yes, there are many features which will never be as efficient on our side compared to a client<>server<>S3 architecture, but that's the price we pay to stay compliant with the protocol. Some features in S3Drive might require an external connection (e.g. cross-client sync) besides the S3, but our plan is to still use an open protocol (e.g. AMQP), so you can decide whether you want to host the queue exchange yourself or buy AMQP as a service. We're all about protocols, and the recent move to 1:1 Rclone compatibility is another proof of that.
TeamDrive is a different beast and we're not as feature rich, but if one needs pretty basic file management, backup and sync, that's where we shine, and we can offer that for half the price. I do appreciate the link though, and if you have more alternatives I would be happy to digest them; always good to learn! (edited)
Tom changed the channel name: Object lock / enterprise 7/31/2023 5:33 AM
Exported 50 message(s)
Timezone: UTC+0